The Perils of Stepwise Logistic Regression and How to Escape Them Using Information Criteria and the Output Delivery System

Authors

  • Ernest S. Shtatland
  • Emily Cain
  • Mary B. Barton
  • Harvard Pilgrim
Abstract

In this presentation, which is a continuation of our NESUG’2000 paper, we demonstrate that using SAS® stepwise logistic regression with the default and most commonly used significance level for entry (SLENTRY) of 0.05 may be unreasonable and sometimes even dangerous: the resulting model usually has too many variables for reliable interpretation and, at the same time, too few variables for good prediction. Users who blindly rely on stepwise logistic regression will most likely get a rather poor choice for both purposes, interpretation and prediction. Recommendations to use critical p-values other than the default often look vague and even contradictory. We propose to resolve this problem by using the Akaike and Schwarz information criteria (which are standard components of the PROC LOGISTIC output), some elements of Bayesian reasoning, and the capabilities of ODS (Output Delivery System), which are available in PROC LOGISTIC in SAS version 8. We also discuss how to improve the model selection process by taking model selection uncertainty into account. The intended audience is SAS users of all levels who work with SAS/STAT® and PROC LOGISTIC in particular.

THE PROBLEMS WITH MODEL SELECTION

Model selection is a fundamental task in data analysis, widely recognized as central to good inference. In SAS PROC LOGISTIC, we have four automatic model selection techniques: forward selection, backward elimination, stepwise selection (which combines elements of the previous two), and the best subset selection procedure. The first three methods are based on the same ideas, and we will discuss only stepwise selection as the more flexible and sophisticated procedure. This choice is subjective; some researchers prefer to work with backward selection. Typically, the final model selected by each of these procedures will be the same, but this is in no way guaranteed.
Stepwise selection is intuitively appealing: it builds models sequentially and allows for the examination of a collection of models that might not otherwise have been examined. The best subset selection method, which is invoked with the option SELECTION = SCORE, is not as popular as forward, backward, and stepwise selection because it can compare only models of the same size (with the same number of covariates). However, we will show how the best subset selection method can be very useful in the final step of our procedure in reducing model selection uncertainty.

Purposeful selection, which combines subject matter knowledge with statistical significance considerations, can be performed only when we have a small number of models to compare originally, or at some advanced step of selection when only a small number of covariates is left. It is worth noting that if we have 10 covariates, the number of all possible models is 2^10 = 1024. With 20 covariates we have more than 1,000,000 possible models, and with 30 covariates the number of possible models is greater than 1,000,000,000. Thus, even with a rather moderate number of covariates we cannot do without stepwise selection. The stepwise technique allows us to reduce drastically the total number of models under consideration and to produce the final model.

The final result will depend substantially on two parameters: SLENTRY (the significance level for entering the model) and SLSTAY (the significance level for staying in the model). If the values of these parameters are not specified, the SAS system uses default values of 0.05 for both. This default value of the significance level is used more often than not without any grounds, just because of an unwritten statistical tradition which says: if you do not have strong personal opinions on this matter, then use 0.05. SLENTRY=0.05 does not mean that the overall significance level is 0.05; the overall level is usually much larger than 5%.
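The two numerical points above can be checked with a short Python sketch. It is an illustration only and not part of the original paper: the function names are hypothetical, and the second function assumes that the entry tests are mutually independent, which is a simplification that stepwise selection does not actually satisfy. Under that assumption it shows how quickly the chance of at least one spurious entry grows beyond the nominal 0.05.

```python
def count_candidate_models(n_covariates: int) -> int:
    """Number of main-effects models: every subset of the covariates,
    including the intercept-only model, so 2 ** n_covariates."""
    return 2 ** n_covariates


def overall_type_i_error(alpha: float, n_tests: int) -> float:
    """Probability of at least one false positive among n_tests tests,
    each run at level alpha.

    ASSUMPTION (illustrative, not from the paper): the tests are
    independent. Stepwise tests are dependent, so this is only a
    rough indication of how the overall level inflates."""
    return 1.0 - (1.0 - alpha) ** n_tests


if __name__ == "__main__":
    # Size of the model space for 10, 20 and 30 covariates.
    for k in (10, 20, 30):
        print(f"{k} covariates: {count_candidate_models(k):,} possible models")

    # Inflation of the overall level when each test uses alpha = 0.05.
    for m in (1, 5, 10, 20):
        print(f"{m} tests at 0.05: overall level ~ {overall_type_i_error(0.05, m):.3f}")
```

With a single test the overall level equals the nominal level. As the number of candidate covariates screened at each step grows, the overall level moves well above 5%, which is the concern raised about leaving SLENTRY at its default.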
One way to deal with this problem is to specify a very small SLENTRY (see, for example, Posters





Publication date: 2001